Parallel WaveNet: Fast High-Fidelity Speech Synthesis

Authors

  • Aäron van den Oord
  • Yazhe Li
  • Igor Babuschkin
  • Karen Simonyan
  • Oriol Vinyals
  • Koray Kavukcuoglu
  • George van den Driessche
  • Edward Lockhart
  • Luis C. Cobo
  • Florian Stimberg
  • Norman Casagrande
  • Dominik Grewe
  • Seb Noury
  • Sander Dieleman
  • Erich Elsen
  • Nal Kalchbrenner
  • Heiga Zen
  • Alex Graves
  • Helen King
  • Tom Walters
  • Dan Belov
  • Demis Hassabis
Abstract

The recently developed WaveNet architecture [27] is the current state of the art in realistic speech synthesis, consistently rated as more natural sounding for many different languages than any previous system. However, because WaveNet relies on sequential generation of one audio sample at a time, it is poorly suited to today’s massively parallel computers, and therefore hard to deploy in a real-time production setting. This paper introduces Probability Density Distillation, a new method for training a parallel feed-forward network from a trained WaveNet with no significant difference in quality. The resulting system generates high-fidelity speech samples more than 20 times faster than real time and is deployed online by Google Assistant, where it serves multiple English and Japanese voices.

Similar resources

WaveNet: A Generative Model for Raw Audio

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that it can be efficiently trained on data with tens of thousands of samples per second of audio. When applied to text-to-speech, it yields state-...

Speaker-Dependent WaveNet Vocoder

In this study, we propose a speaker-dependent WaveNet vocoder, a method of synthesizing speech waveforms with WaveNet, by utilizing acoustic features from an existing vocoder as auxiliary features of WaveNet. It is expected that WaveNet can learn a sample-by-sample correspondence between speech waveform and acoustic features. The advantage of the proposed method is that it does not require (1) exp...

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text. The system is composed of a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms, followed by a modified WaveNet model acting as a vocoder to synthesize time-domain waveforms from those spectrograms. Our model achieves a mean opinio...

Hybridnet: a Hybrid Neural Architecture to Speed-up Autoregressive Models

This paper introduces HybridNet, a hybrid neural network to speed up autoregressive models for raw audio waveform generation. As an example, we propose a hybrid model that combines an autoregressive network named WaveNet and a conventional LSTM model to address speech synthesis. Instead of generating one sample per time-step, the proposed HybridNet generates multiple samples per time-step by ex...

Text-to-speech Synthesis System based on Wavenet

In this project, we focus on building a novel parametric TTS system. Our model is based on WaveNet (Oord et al., 2016), a deep neural network introduced by DeepMind in late 2016 for generating raw audio waveforms. It is fully probabilistic, with the predictive distribution for each audio sample conditioned on all previous samples. The model introduces the idea of convolutional layer into TTS task...

Journal title:
  • CoRR

Volume abs/1711.10433

Publication date: 2017